DATAX121-23A (HAM) & (SEC) - Introduction to Statistical Methods
In T031, we learnt that we could quantify the uncertainty in statistics calculated from a single sample
Researchers in the many fields that apply statistical methods often want to “measure evidence” under the assumption that a hypothesis is true
That is, they want a statistical test: a procedure that uses data to judge whether a statement about the population (or process) from which we collected the data may be true or not
The null hypothesis, represented by the symbol H0, is a statement that there is “nothing” happening. In most situations, the researcher hopes to disprove or reject the null hypothesis
The alternative hypothesis, represented by the symbol H1 or Ha, is a statement that “something” is happening. In most situations, this hypothesis is what the researcher hopes to prove
— Utts & Heckard (2015)
The structure of these hypothesis statements, logically and mathematically, allows us to examine whether the data provide enough evidence to refute the null, \(H_0\), in support of the alternative \(H_1\)
Do tertiary students spend more than half of their weekly income on rent?
Let \(p\) be the underlying proportion of weekly income that tertiary students spend on rent
\(\quad H_0\!: p = 0.5\)
\(\quad H_1\!: p > 0.5\)
Do people read faster using a printed book or a Kindle or iPad?
Let \(\mu_\text{Book}\), \(\mu_\text{Kindle}\), and \(\mu_\text{iPad}\) be the underlying mean reading speed of people using a printed book, Kindle, and iPad, respectively
\(\quad H_0\!: \mu_\text{Book} = \mu_\text{Kindle} = \mu_\text{iPad}\)
\(\quad H_1\!: \text{at least one} ~ \mu_i \ne \mu_j\)
One way to understand this mechanism is to consider the sampling distribution of a statistic by defining what we “hypothesise” the parameter to be
This allows us to answer the following two questions:
The p-value (or P-value) is calculated by assuming that the null hypothesis is true, and then determining the probability of a statistic as extreme as, or more extreme than, the observed statistic
Briefly speaking: A probability can be defined as the chance of an “event” according to a probability distribution
\(\alpha\), also known as the significance level, is the borderline between when a p-value is “small” enough and when it is not “small” enough
The most common choice is \(\alpha = 0.05 ~ (5\%)\)
In any statistical test, e.g. a hypothesis test, the smaller the p-value, the stronger the evidence is against the null hypothesis, and the stronger the evidence is in favour of the alternative hypothesis
| Evidence against the null | Very strong | Strong | Some | Weak | None |
|---|---|---|---|---|---|
| p-value | ≤ 0.01 | 0.01 to 0.05 | ≈ 0.05 | 0.05 to 0.10 | > 0.10 |
| If α = 0.05 | | | | | |
Recall from Slide 6 that:
A difference exists between the choice of \(\neq\) (two-sided) versus \(>\) or \(<\) (one-sided), and for most, if not all, cases, this affects the calculation of the p-value
In practice, most hypothesis tests are two-sided because the “direction” of the alternative hypothesis is not always clear when we translate a research question into a set of null and alternative hypotheses
Also known as the one-sample t-test (for μ)
More on 2.
\[ t_0 = \frac{\bar{x} - \mu_0}{\text{se}(\bar{x})} \]
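where \(\bar{x}\) is the sample mean, \(\mu_0\) is the hypothesised value of \(\mu\) under \(H_0\), and \(\text{se}(\bar{x}) = s / \sqrt{n}\) is the standard error of the sample mean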
William S. Gosset, and others, derived the exact probability distribution to model the probability of observing an interval of test statistics
When the test statistic is for the population mean, \(\mu\), we use the Student’s t-distribution to calculate the p-value
The mathematical details relevant for us in DATAX121 are that:
Let \(T\) follow the Student’s t-distribution with \(\nu = n - 1\) degrees of freedom
If it is a two-sided test, e.g. \(H_1 \! : \mu \neq \mu_0\)
\(\quad p\text{-value} = 2 \times \mathbb{P}(T > |t_0|)\)
\(|t_0|\) stands for the absolute value of \(t_0\), which “removes” the sign of a value
For example: \(|-2| = 2\) and \(|2| = 2\)
If it is a one-sided test and \(H_1 \! : \mu > \mu_0\)
\(\quad p\text{-value} = \mathbb{P}(T > t_0)\)
If it is a one-sided test and \(H_1 \! : \mu < \mu_0\)
\(\quad p\text{-value} = \mathbb{P}(T < t_0)\)
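The three p-value formulas map onto R's `pt()` function, which returns tail probabilities of the Student's t-distribution. The following is a minimal sketch, assuming a small made-up sample `x` and hypothesised mean `mu0`; both are illustrative values and not course data.

```r
# Hypothetical sample and hypothesised mean (illustrative values only)
x   <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9)
mu0 <- 5.5

n  <- length(x)
se <- sd(x) / sqrt(n)        # standard error of the sample mean
t0 <- (mean(x) - mu0) / se   # test statistic
nu <- n - 1                  # degrees of freedom

# Two-sided test, H1: mu != mu0
p.two.sided <- 2 * pt(abs(t0), df = nu, lower.tail = FALSE)
# One-sided test, H1: mu > mu0
p.greater <- pt(t0, df = nu, lower.tail = FALSE)
# One-sided test, H1: mu < mu0
p.less <- pt(t0, df = nu)

c(t0 = t0, two.sided = p.two.sided, greater = p.greater, less = p.less)
```

The two-sided value should agree with `t.test(x, mu = mu0)$p.value`, and the two one-sided values always sum to 1.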
lightspeed.df <- read.csv("datasets/lightspeed.csv")
t.test(pass.time ~ 1, data = lightspeed.df, conf.level = 0.95, mu = 24.8296)
One Sample t-test
data: pass.time
t = -0.91633, df = 19, p-value = 0.371
alternative hypothesis: true mean is not equal to 24.8296
95 percent confidence interval:
24.82615 24.83095
sample estimates:
mean of x
24.82855
Recall that the theoretical passage time for Newcomb’s experiment was 24.8296 millionths of a second.
\(H_0\!: \mu = 24.8296\)
\(H_1\!: \mu \neq 24.8296\)
\(p\text{-value} = 2 \times \mathbb{P}(T > |-0.92|)\)
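As a quick check, the p-value can be recomputed from the test statistic and degrees of freedom printed in the `t.test()` output. This sketch uses only those two reported values; the variable names are our own.

```r
t0 <- -0.91633   # test statistic reported by t.test()
nu <- 19         # degrees of freedom reported by t.test()

# Two-sided p-value: 2 * P(T > |t0|)
p.value <- 2 * pt(abs(t0), df = nu, lower.tail = FALSE)
round(p.value, 4)
```

The result should agree with the p-value of 0.371 reported in the output.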
Style One
We do not reject that the underlying mean passage time is 24.8296 millionths of a second at the 5% significance level, in favour of the alternative that it is not 24.8296 millionths of a second (p-value = 0.3710).
Style Two
We have no evidence against the underlying mean passage time being 24.8296 millionths of a second, in favour of the alternative that it is not 24.8296 millionths of a second (p-value = 0.3710).
Critical features
You may have noticed that \(\alpha\) is only a borderline (or threshold) value we use to evaluate the p-value against
More specifically, it represents the tolerable probability of making a Type I error. A Type I error is the scenario where we reject the null in favour of the alternative when the null is, in reality, true
An example of a Type I error is a doctor telling a biological male that “you are pregnant”, when they should have said “you are not pregnant”
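One way to see that \(\alpha\) is the Type I error probability is to simulate many samples for which the null hypothesis is true and count how often the test rejects it. The sketch below assumes normally distributed data; the seed, sample size, mean and standard deviation are arbitrary illustrative choices.

```r
set.seed(121)      # arbitrary seed, for reproducibility
alpha  <- 0.05
mu0    <- 10       # hypothetical null value, also the true mean here
n.sims <- 10000

# Simulate samples for which H0 is true and record each two-sided p-value
p.values <- replicate(n.sims, {
  x <- rnorm(20, mean = mu0, sd = 2)
  t.test(x, mu = mu0)$p.value
})

# Proportion of simulated tests that (wrongly) reject H0
type.I.rate <- mean(p.values < alpha)
type.I.rate
```

Under these assumptions the proportion of false rejections should be close to \(\alpha = 0.05\).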
Synthetic sample data based on real data from the June quarter 2011 NZ Income Survey. The survey was an annual snapshot to produce income statistics on New Zealanders aged 15 and over based on a representative sample of the population.
| Variables | |
|---|---|
| ethnicity | A factor denoting the ethnicity with 6 levels |
| region | A factor denoting the region of residence |
| gender | A factor denoting the gender, male or female |
| agegp | A factor denoting the five year age-band. Note that the value 65 describes an individual aged 65 or older |
| qualification | A factor denoting the highest qualification level with 5 levels |
| occupation | A factor denoting the category of the main income source with 10 levels |
| hours | A number denoting the weekly hours worked from all wages and salary jobs excluding self-employment |
| income | A number denoting gross weekly income from all sources ($) |
We want to test whether the population mean gross weekly income, in 2011, was greater than $1000
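This is a one-sided test with \(H_0\!: \mu = 1000\) and \(H_1\!: \mu > 1000\), which `t.test()` handles through its `alternative` argument. The sketch below is only a template: `income.df` is a hypothetical data frame name, and simulated values stand in for the survey data, so the printed result says nothing about actual incomes.

```r
set.seed(121)   # arbitrary seed, for reproducibility

# Simulated stand-in for the survey data (NOT the real data set)
income.df <- data.frame(income = rlnorm(200, meanlog = 6.7, sdlog = 0.6))

# One-sided, one-sample t-test of H0: mu = 1000 against H1: mu > 1000
income.test <- t.test(income ~ 1, data = income.df,
                      mu = 1000, alternative = "greater")
income.test
```

With `alternative = "greater"`, the reported p-value is \(\mathbb{P}(T > t_0)\) and the confidence interval is one-sided.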
It is generally recommended that adults sleep at least 8 hours each night. A lecturer recently asked some of her students how many hours each had slept the previous night, curious as to whether her students were getting enough sleep.
The 12 students sampled averaged 6.2 hours of sleep with a standard deviation of 1.7 hours. Assuming that this sample meets the assumptions, does this data provide evidence (at the 5% significance level) that her students, on average, are not getting enough sleep?
You use the fact that \(t^\ast_{0.975}(11) = 2.20\)
The exact p-value was 0.0037
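The test statistic and p-value can be computed from the summary statistics alone. In this sketch the variable names are our own; the quoted p-value of 0.0037 appears to correspond to the two-sided calculation, and the one-sided version (\(H_1\!: \mu < 8\)) is shown for comparison.

```r
n    <- 12    # sample size
xbar <- 6.2   # sample mean (hours)
s    <- 1.7   # sample standard deviation (hours)
mu0  <- 8     # hypothesised mean (hours)

se <- s / sqrt(n)        # standard error of the sample mean
t0 <- (xbar - mu0) / se  # test statistic

p.two.sided <- 2 * pt(abs(t0), df = n - 1, lower.tail = FALSE)  # H1: mu != 8
p.less      <- pt(t0, df = n - 1)                               # H1: mu < 8

c(t0 = t0, two.sided = p.two.sided, less = p.less)
```

Because \(t_0\) is negative here, the one-sided p-value for \(H_1\!: \mu < 8\) is exactly half of the two-sided one.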
What about the 95% confidence interval for the population mean hours of sleep?
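The interval follows the usual form, estimate ± multiplier × standard error. Here is a minimal sketch using the summary statistics from the question, with `qt()` supplying the multiplier (about 2.20 for 11 degrees of freedom); the variable names are our own.

```r
n    <- 12    # sample size
xbar <- 6.2   # sample mean (hours)
s    <- 1.7   # sample standard deviation (hours)

t.mult <- qt(0.975, df = n - 1)   # t-multiplier for a 95% interval
se     <- s / sqrt(n)             # standard error of the sample mean

# 95% confidence interval for the population mean hours of sleep
ci <- xbar + c(-1, 1) * t.mult * se
round(ci, 2)
```

The recommended 8 hours can then be compared against the two limits of the interval.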
Generally speaking, when there is a two-sided hypothesis test and a confidence interval for a parameter, the significance level, \(\alpha\), determines both the significance threshold for the p-value and the width of the confidence interval
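This link can be checked numerically: a two-sided t-test rejects \(H_0\!: \mu = \mu_0\) at level \(\alpha\) exactly when \(\mu_0\) falls outside the \((1 - \alpha)\) confidence interval. The sketch below assumes a simulated sample and a handful of arbitrary hypothesised values, all of which are illustrative.

```r
set.seed(121)                          # arbitrary seed, for reproducibility
x     <- rnorm(25, mean = 50, sd = 8)  # hypothetical sample
alpha <- 0.05

# For each hypothesised value, compare the test decision with the interval
agrees <- sapply(c(45, 48, 50, 52, 55), function(mu0) {
  fit        <- t.test(x, mu = mu0, conf.level = 1 - alpha)
  rejects    <- fit$p.value < alpha
  outside.ci <- mu0 < fit$conf.int[1] || mu0 > fit$conf.int[2]
  rejects == outside.ci
})
agrees
```

Every entry of `agrees` should be `TRUE`, since the test decision and the interval are built from the same t-multiplier.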
What about a one-sided hypothesis test?